Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
vanBerkel, Kees; Ciabattoni, Agata; Horty, John (Ed.)Markov Decision Processes (MDPs) are the most common model for decision making under uncertainty in the Machine Learning community. An MDP captures nondeterminism, probabilistic uncertainty, and an explicit model of action. A Reinforcement Learning (RL) agent learns to act in an MDP by maximizing a utility function. This paper considers the problem of learning a decision policy that maximizes utility subject to satisfying a constraint expressed in deontic logic. In this setup, the utility captures the agent’s mission - such as going quickly from A to B. The deontic formula represents (ethical, social, situational) constraints on how the agent might achieve its mission by prohibiting classes of behaviors. We use the logic of Expected Act Utilitarianism, a probabilistic stit logic that can be interpreted over controlled MDPs. We develop a variation on policy improvement, and show that it reaches a constrained local maximum of the mission utility. Given that in stit logic, an agent’s duty is derived from value maximization, this can be seen as a way of acting to simultaneously maximize two value functions, one of which is implicit, in a bi-level structure. We illustrate these results with experiments on sample MDPs.more » « lessFree, publicly-accessible full text available July 1, 2026
-
Free, publicly-accessible full text available July 1, 2026
-
We present algorithms for Cyber-Physical Systems (CPS) falsification and control, which take advantage of knowing the entire language of the temporal logic specification - that is, the set of signals that satisfy the formula. In the design of CPS, falsification and control play key roles. Falsification is a testing task, where the goal is to find an input signal that causes the system's output trajectory to violate the correctness requirements. Control is the dual task, where the goal is to find an input signal that causes the system's output to satisfy the specification. When the specification is expressed in a temporal logic, most existing work relies on local optimization heuristics to perform both tasks. In this paper, we explore whether a different expression of the specification offers advantages when performing falsification and control. Recent work presented a method for computing a representation of the language of a formula in (discrete-time) Signal Temporal Logic (STL), showing that the language can be represented as a union of polytopes. We introduce new falsification algorithms which combine distance information to the different components of the language to accelerate the convergence to a falsifier. And we introduce a new algorithm for computing a satisfying control signal which works by repeatedly projecting violating output trajectories back onto the language's components. Moreover, these algorithms are trivially parallelizable to take advantage of multiple processors. Despite their relative simplicity, our algorithms demonstrate 10x to 100x speedups relative to the state-of-the-art.more » « lessFree, publicly-accessible full text available May 6, 2026
-
When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict.These obligations include ethical and social obligations, permissions and prohibitions, which constrain how the agent achieves its mission and executes its policy.We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations.Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs.We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic.We illustrate our algorithms on DAC-MDPs which accurately abstract neural decision policies, and on toy gridworld environments.more » « less
-
Urban Air Mobility, the scenario where hundreds of manned and Unmanned Aircraft Systems (UASs) carry out a wide variety of missions (e.g., moving humans and goods within the city), is gaining acceptance as a transportation solution of the future. One of the key requirements for this to happen is safely managing the air traffic in these urban airspaces. Due to the expected density of the airspace, this requires fast autonomous solutions that can be deployed online. We propose Learning-‘N-Flying (LNF), a multi-UAS Collision Avoidance (CA) framework. It is decentralized, works on the fly, and allows autonomous Unmanned Aircraft System (UAS)s managed by different operators to safely carry out complex missions, represented using Signal Temporal Logic, in a shared airspace. We initially formulate the problem of predictive collision avoidance for two UASs as a mixed-integer linear program, and show that it is intractable to solve online. Instead, we first develop Learning-to-Fly (L2F) by combining (1) learning-based decision-making and (2) decentralized convex optimization-based control. LNF extends L2F to cases where there are more than two UASs on a collision path. Through extensive simulations, we show that our method can run online (computation time in the order of milliseconds) and under certain assumptions has failure rates of less than 1% in the worst case, improving to near 0% in more relaxed operations. We show the applicability of our scheme to a wide variety of settings through multiple case studies.more » « less
An official website of the United States government
